Error exponent




Nonzero-sum Adversarial Hypothesis Testing Games

Sarath Yasodharan, Patrick Loiseau

Neural Information Processing Systems

We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian and the Neyman-Pearson frameworks. We first show that these games admit mixed-strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria.



Secure Best Arm Identification in the Presence of a Copycat

Cohen, Asaf, Günlü, Onur

arXiv.org Artificial Intelligence

Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax-optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.
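The reward model and the naive secure baseline described in this abstract can be mocked up in a few lines. This is an illustrative sketch under invented parameters (the dimensions, arm set, and Gaussian noise below are assumptions for the toy), not the paper's coded-arms algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 4, 6, 2000                  # illustrative sizes, not from the paper
theta = rng.normal(size=d)            # unknown parameter vector
arms = rng.normal(size=(K, d))        # K arms of dimension d

def pull(a):
    """Reward = dot product of the arm with theta, plus independent noise."""
    return arms[a] @ theta + rng.normal()

# Naive secure baseline: play all arms equally, so the pull counts
# leak nothing about the best-arm estimate (at the cost of the weaker
# Omega(T/d) error exponent noted in the abstract).
counts, sums = np.zeros(K), np.zeros(K)
for t in range(T):
    a = t % K
    counts[a] += 1
    sums[a] += pull(a)

estimate = int(np.argmax(sums / counts))  # best-arm estimate from mean rewards
```

Because every arm is pulled the same number of times (up to one pull), an observer of the pull sequence learns nothing about `estimate`; the paper's contribution is recovering a near-optimal exponent while keeping this secrecy property.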


Statistical Mean Estimation with Coded Relayed Observations

Ling, Yan Hao, Yang, Zhouhao, Scarlett, Jonathan

arXiv.org Artificial Intelligence

We consider a problem of statistical mean estimation in which the samples are not observed directly, but are instead observed by a relay (``teacher'') that transmits information through a memoryless channel to the decoder (``student''), who then produces the final estimate. We consider the minimax estimation error in the large deviations regime, and establish achievable error exponents that are tight in broad regimes of the estimation accuracy and channel quality. In contrast, two natural baseline methods are shown to yield strictly suboptimal error exponents. We initially focus on Bernoulli sources and binary symmetric channels, and then generalize to sub-Gaussian and heavy-tailed settings along with arbitrary discrete memoryless channels.
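A minimal toy version of the teacher/student pipeline above, for the Bernoulli-source, binary-symmetric-channel case: the teacher compresses its samples to a crude one-bit summary and protects it with a repetition code, and the student decodes by majority vote. All parameters below are invented for illustration, and the paper's actual coding schemes are more refined than this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p_flip, reps = 1000, 0.1, 15   # samples, BSC crossover prob, code length
mu = 0.3                          # true Bernoulli mean, unknown to the student

samples = rng.random(n) < mu              # teacher ("relay") observes samples
bit = int(samples.mean() > 0.5)           # crude 1-bit summary: mean above 1/2?

sent = np.full(reps, bit)                 # repetition code over the BSC
received = sent ^ (rng.random(reps) < p_flip)
decoded = int(received.sum() > reps / 2)  # student decodes by majority vote
```

Even this crude scheme shows the structure of the problem: the end-to-end error has two exponential contributions, one from the statistical estimation at the teacher and one from the channel noise, and the achievable exponent depends on how the two are balanced.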


Towards minimax optimal algorithms for Active Simple Hypothesis Testing

Vijayan, Sushant

arXiv.org Machine Learning

We study the problem of Active Simple Hypothesis Testing (ASHT), where an agent is faced with the problem of choosing between m different simple hypotheses after observing T samples. At the end of the T samples, it has to output one of the m hypotheses. The distinguishing difference from the usual hypothesis testing scenario is the ability to choose one of K actions and observe the corresponding sample for that action. This ability to control the samples makes the problem more interesting and difficult than usual hypothesis testing with no control over sample generation. The performance of the agent is measured in terms of the error probability its decision incurs. This theoretical problem is a model for many practical scenarios: a cosmetic drug trial often involves a testing period whose goal is to identify the best product after the trial; choosing a channel from a set of channels before commencing communications; or placing sensors in a certain set of positions so as to minimize signal error. Any situation that requires a period of testing before committing to a final decision, with only a fixed budget of samples (that is, an inability to request additional samples), can be modeled effectively using ASHT and its more general version, Fixed Budget Best Arm Identification (FB-BAI). We study the ASHT problem in the large deviation setting, with the quantity of interest being the minimax error exponent over the hypotheses, that is, the worst-case error exponent over the hypotheses.
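For the simplest fixed-budget case with two hypotheses and a single action, the best achievable error exponent is the classical Chernoff information between the two distributions; the error probability after T samples decays roughly as exp(-T·C(P, Q)). The sketch below computes this standard quantity for two hypothetical binary distributions, using a grid search in place of the exact one-dimensional minimization:

```python
import numpy as np

def chernoff_information(P, Q, grid=1001):
    """C(P, Q) = -min_{s in [0, 1]} log sum_x P(x)^s Q(x)^(1 - s),
    approximated by a grid search over s."""
    s = np.linspace(0.0, 1.0, grid)[:, None]
    vals = np.log(np.sum(P[None, :] ** s * Q[None, :] ** (1 - s), axis=1))
    return -vals.min()

# Two hypothetical hypotheses over a binary alphabet.
P = np.array([0.5, 0.5])
Q = np.array([0.9, 0.1])
C = chernoff_information(P, Q)  # governs the fixed-budget error exponent
```

The active version of the problem is harder precisely because the exponent also depends on how the K actions are allocated, which is what the minimax analysis addresses.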


The Query/Hit Model for Sequential Hypothesis Testing

Shariatnasab, Mahshad, Rini, Stefano, Shirani, Farhad, Iyengar, S. Sitharama

arXiv.org Artificial Intelligence

This work introduces the Query/Hit (Q/H) learning model. The setup consists of two agents. One agent, Alice, has access to a streaming source, while the other, Bob, does not have direct access to the source. Communication occurs through sequential Q/H pairs: Bob sends a sequence of source symbols (queries), and Alice responds with the waiting time until each query appears in the source stream (hits). This model is motivated by scenarios with communication, computation, and privacy constraints that limit real-time access to the source. The error exponent for sequential hypothesis testing under the Q/H model is characterized, and a querying strategy, the Dynamic Scout-Sentinel Algorithm (DSSA), is proposed. The strategy employs a mutual information neural estimator to compute the error exponent associated with each query and to select the query with the highest efficiency. Extensive empirical evaluations on both synthetic and real-world datasets -- including mouse movement trajectories, typesetting patterns, and touch-based user interactions -- are provided to evaluate the performance of the proposed strategy in comparison with baselines, in terms of probability of error, query choice, and time-to-detection.
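The query/hit exchange can be mocked up for an i.i.d. source: the hit time of a single queried symbol is geometric with mean 1/P(symbol), so repeated queries let Bob learn about the source statistics without ever seeing the stream. The alphabet and probabilities below are invented, and the sketch estimates a symbol probability rather than running the paper's sequential test:

```python
import numpy as np

rng = np.random.default_rng(2)

alphabet = ["a", "b", "c"]
probs = [0.6, 0.3, 0.1]         # Alice's source distribution, hidden from Bob

def hit_time(query, max_len=100_000):
    """Waiting time until `query` first appears in the i.i.d. source stream."""
    for t in range(1, max_len + 1):
        if rng.choice(alphabet, p=probs) == query:
            return t
    return max_len

# For an i.i.d. source, the hit time of "a" is geometric with mean 1/P(a),
# so averaging many hits gives Bob an estimate of P(a) from Q/H pairs alone.
hits = [hit_time("a") for _ in range(2000)]
est_p_a = 1.0 / np.mean(hits)
```

In the paper's setting, the interesting question is which query to send next: different queries carry different amounts of information per unit waiting time, which is what the DSSA strategy estimates when ranking queries by efficiency.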


Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty

Zhu, Lina, Zhou, Lin

arXiv.org Machine Learning

In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most existing studies that focus on discrete-valued sequences with perfect distribution match, we study multiple classification for continuous sequences with distribution uncertainty, where the generating distributions of the testing and training sequences deviate even under the true hypothesis. In particular, we propose distribution-free tests and prove that the error probabilities of our tests decay exponentially fast for three different test designs: fixed-length, sequential, and two-phase tests. We first consider the simple case without the null hypothesis, where the testing sequence is known to be generated from a distribution close to the generating distribution of one of the training sequences. Subsequently, we generalize our results to a more general case with the null hypothesis by allowing the testing sequence to be generated from a distribution that is vastly different from the generating distributions of all training sequences.
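A bare-bones caricature of such a test: match the testing sequence to the closest training sequence by an empirical statistic, and declare the null hypothesis when nothing is close enough. The Gaussian sequences, the use of empirical means, and the threshold value are all illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(3)

M, n = 3, 500
training = [rng.normal(mu, 1.0, n) for mu in (0.0, 2.0, 4.0)]  # M sequences

def classify(test, training, reject_threshold=1.0):
    """Match the test sequence to the closest training sequence by
    empirical mean; declare the null hypothesis (None) if nothing is close."""
    dists = [abs(test.mean() - tr.mean()) for tr in training]
    j = int(np.argmin(dists))
    return None if dists[j] > reject_threshold else j

# Testing sequence generated near, but not exactly from, the distribution of
# training sequence 1, modelling the distribution-uncertainty setting.
test_seq = rng.normal(2.1, 1.0, n)
label = classify(test_seq, training)
```

The deliberate mean mismatch (2.1 vs. 2.0) mirrors the paper's setting: the test must tolerate a bounded deviation between testing and training distributions under the true hypothesis while still rejecting vastly different sources.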


Error Exponent in Agnostic PAC Learning

Hendel, Adi, Feder, Meir

arXiv.org Machine Learning

Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the common approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and has been studied thoroughly. Uniform worst-case bounds on the convergence rate have been well established using, e.g., VC theory or Rademacher complexity. However, in a typical scenario the performance can be much better. In this paper, we consider PAC learning using a somewhat different tradeoff, the error exponent, a well-established analysis method in information theory that describes the exponential behavior of the probability that the risk will exceed a certain threshold as a function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution-dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can also be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis.
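The error-exponent viewpoint can be illustrated with a toy experiment: for a fixed classifier with true risk r, Hoeffding's inequality gives P(empirical risk > r + eps) <= exp(-2·n·eps²), so the excess-risk probability shrinks exponentially in the sample size. The risk, threshold, and sample sizes below are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

r, eps, trials = 0.2, 0.1, 20_000   # true risk, excess threshold, MC runs

def excess_prob(n):
    """Monte Carlo estimate of P(empirical risk over n samples > r + eps)."""
    losses = rng.random((trials, n)) < r       # i.i.d. 0/1 losses with mean r
    return float(np.mean(losses.mean(axis=1) > r + eps))

# Quadrupling the sample size should shrink the excess-risk probability
# multiplicatively (Hoeffding exponent at least 2 * eps**2 per sample).
p_small, p_large = excess_prob(50), excess_prob(200)
```

This is the per-classifier tail behavior; the paper's contribution concerns the sharper, distribution-dependent exponent of the error probability of the learning rule itself, which can beat the uniform worst-case rate.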